AITopics | semantic anchor

Collaborating Authors

semantic anchor

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

d75f561eaaf2cb754bc8d7e36d8af362-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 04:50:09 GMT

domain generalization, generalization, representation, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Zhang, Jusheng, Guo, Xiaoyang, Cai, Kaitong, Lv, Qinhan, Fan, Yijia, Chai, Wenhao, Wang, Jian, Wang, Keze

arXiv.org Artificial IntelligenceDec-10-2025

Vision-language models (VLMs) have transformed multimodal reasoning, but feeding hundreds of visual patch tokens into LLMs incurs quadratic computational costs, straining memory and context windows. Traditional approaches face a trade-off: continuous compression dilutes high-level semantics such as object identities, while discrete quantization loses fine-grained details such as textures. We introduce HTC-VLM, a hybrid framework that disentangles semantics and appearance through dual channels, i.e., a continuous pathway for fine-grained details via ViT patches and a discrete pathway for symbolic anchors using MGVQ quantization projected to four tokens. These are fused into a 580-token hybrid sequence and compressed into a single voco token via a disentanglement attention mask and bottleneck, ensuring efficient and grounded representations. HTC-VLM achieves an average performance retention of 87.2 percent across seven benchmarks (GQA, VQAv2, MMBench, MME, POPE, SEED-Bench, ScienceQA-Image), outperforming the leading continuous baseline at 81.0 percent with a 580-to-1 compression ratio. Attention analyses show that the compressed token prioritizes the discrete anchor, validating its semantic guidance. Our work demonstrates that a minimalist hybrid design can resolve the efficiency-fidelity dilemma and advance scalable VLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.0824

Country:

Europe > Austria > Vienna (0.14)
North America > United States (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models

Jeong, Geunyeong, Sun, Juoh, Lee, Seonghee, Kim, Harksoo

arXiv.org Artificial IntelligenceOct-14-2025

Large Language Models store extensive factual knowledge acquired during large-scale pre-training. However, this knowledge is inherently static, reflecting only the state of the world at the time of training. Knowledge editing has emerged as a promising solution for updating outdated or incorrect facts without full retraining. However, most existing locate-and-edit methods primarily focus on token-level likelihood optimization without addressing semantic coherence. Our analysis reveals that such edited knowledge is often encoded as isolated residual streams in the model's latent space, distinct from pre-existing knowledge and bypassing natural reasoning process. To address this, we propose \textsc{Steam}, a semantic-level knowledge editing framework that enhances integration of updated knowledge into the model's knowledge structure. \textsc{Steam} first identifies target representations as semantic anchors for the updated factual association, then guides the internal representation of the edited fact towards these anchors through an alignment loss during optimization. Experimental results demonstrate that \textsc{Steam} improves model's ability to reason with edited knowledge and enhances semantic coherence, underscoring the importance of latent-space alignment for reliable and coherent knowledge editing. The code is available at https://github.com/GY-Jeong/STEAM.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.10398

Country:

Asia > Singapore (0.04)
North America > United States (0.04)
North America > Dominican Republic (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Opening the Black Box: Interpretable LLMs via Semantic Resonance Architecture

Ternovtsii, Ivan

arXiv.org Artificial IntelligenceSep-19-2025

Large language models (LLMs) achieve remarkable performance but remain difficult to interpret. Mixture-of-Experts (MoE) models improve efficiency through sparse activation, yet typically rely on opaque, learned gating functions. While similarity-based routing (Cosine Routers) has been explored for training stabilization, its potential for inherent interpretability remains largely untapped. We introduce the Semantic Resonance Architecture (SRA), an MoE approach designed to ensure that routing decisions are inherently interpretable. SRA replaces learned gating with a Chamber of Semantic Resonance (CSR) module, which routes tokens based on cosine similarity with trainable semantic anchors. We also introduce a novel Dispersion Loss that encourages orthogonality among anchors to enforce diverse specialization. Experiments on WikiText-103 demonstrate that SRA achieves a validation perplexity of 13.41, outperforming both a dense baseline (14.13) and a Standard MoE baseline (13.53) under matched active parameter constraints (29.0M). Crucially, SRA exhibits superior expert utilization (1.0% dead experts vs. 14.8% in the Standard MoE) and develops distinct, semantically coherent specialization patterns, unlike the noisy specialization observed in standard MoEs. This work establishes semantic routing as a robust methodology for building more transparent and controllable language models.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2509.14255

Country:

Europe > Ukraine (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Air (0.41)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Add feedback

Structures Meet Semantics: Multimodal Fusion via Graph Contrastive Learning

Sun, Jiangfeng, He, Sihao, Ou, Zhonghong, Song, Meina

arXiv.org Artificial IntelligenceAug-27-2025

Multimodal sentiment analysis (MSA) aims to infer emotional states by effectively integrating textual, acoustic, and visual modalities. Despite notable progress, existing multimodal fusion methods often neglect modality-specific structural dependencies and semantic misalignment, limiting their quality, interpretability, and robustness. To address these challenges, we propose a novel framework called the Structural-Semantic Unifier (SSU), which systematically integrates modality-specific structural information and cross-modal semantic grounding for enhanced multimodal representations. Specifically, SSU dynamically constructs modality-specific graphs by leveraging linguistic syntax for text and a lightweight, text-guided attention mechanism for acoustic and visual modalities, thus capturing detailed intra-modal relationships and semantic interactions. We further introduce a semantic anchor, derived from global textual semantics, that serves as a cross-modal alignment hub, effectively harmonizing heterogeneous semantic spaces across modalities. Additionally, we develop a multiview contrastive learning objective that promotes discriminability, semantic consistency, and structural coherence across intra- and inter-modal views. Extensive evaluations on two widely used benchmark datasets, CMU-MOSI and CMU-MOSEI, demonstrate that SSU consistently achieves state-of-the-art performance while significantly reducing computational overhead compared to prior methods. Comprehensive qualitative analyses further validate SSU's interpretability and its ability to capture nuanced emotional patterns through semantically grounded interactions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.18322

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization Chaoqi Chen 1 Luyao T ang

Neural Information Processing SystemsAug-19-2025, 07:19:37 GMT

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one.

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

From Abstraction to Reality: DARPA's Vision for Robust Sim-to-Real Autonomy

Noorani, Erfaun, Serlin, Zachary, Price, Ben, Velasquez, Alvaro

arXiv.org Artificial IntelligenceMar-13-2025

The DARPA Transfer from Imprecise and Abstract Models to Autonomous Technologies (TIAMAT) program aims to address rapid and robust transfer of autonomy technologies across dynamic and complex environments, goals, and platforms. Existing methods for simulation-to-reality (sim-to-real) transfer often rely on high-fidelity simulations and struggle with broad adaptation, particularly in time-sensitive scenarios. Although many approaches have shown incredible performance at specific tasks, most techniques fall short when posed with unforeseen, complex, and dynamic real-world scenarios due to the inherent limitations of simulation. In contrast to current research that aims to bridge the gap between simulation environments and the real world through increasingly sophisticated simulations and a combination of methods typically assuming a small sim-to-real gap -- such as domain randomization, domain adaptation, imitation learning, meta-learning, policy distillation, and dynamic optimization -- TIAMAT takes a different approach by instead emphasizing transfer and adaptation of the autonomy stack directly to real-world environments by utilizing a breadth of low(er)-fidelity simulations to create broadly effective sim-to-real transfers. By abstractly learning from multiple simulation environments in reference to their shared semantics, TIAMAT's approaches aim to achieve abstract-to-real transfer for effective and rapid real-world adaptation. Furthermore, this program endeavors to improve the overall autonomy pipeline by addressing the inherent challenges in translating simulated behaviors into effective real-world performance.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.11007

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

Add feedback

FedSA: A Unified Representation Learning via Semantic Anchors for Prototype-based Federated Learning

Zhou, Yanbing, Qu, Xiangmou, You, Chenlong, Zhou, Jiyang, Tang, Jingyue, Zheng, Xin, Cai, Chunmao, Wu, Yingbo

arXiv.org Artificial IntelligenceJan-9-2025

Prototype-based federated learning has emerged as a promising approach that shares lightweight prototypes to transfer knowledge among clients with data heterogeneity in a model-agnostic manner. However, existing methods often collect prototypes directly from local models, which inevitably introduce inconsistencies into representation learning due to the biased data distributions and differing model architectures among clients. In this paper, we identify that both statistical and model heterogeneity create a vicious cycle of representation inconsistency, classifier divergence, and skewed prototype alignment, which negatively impacts the performance of clients. To break the vicious cycle, we propose a novel framework named Federated Learning via Semantic Anchors (FedSA) to decouple the generation of prototypes from local representation learning. We introduce a novel perspective that uses simple yet effective semantic anchors serving as prototypes to guide local models in learning consistent representations. By incorporating semantic anchors, we further propose anchor-based regularization with margin-enhanced contrastive learning and anchor-based classifier calibration to correct feature extractors and calibrate classifiers across clients, achieving intra-class compactness and inter-class separability of prototypes while ensuring consistent decision boundaries. We then update the semantic anchors with these consistent and discriminative prototypes, which iteratively encourage clients to collaboratively learn a unified data representation with robust generalization. Extensive experiments under both statistical and model heterogeneity settings show that FedSA significantly outperforms existing prototype-based FL methods on various classification tasks.

artificial intelligence, machine learning, prototype, (15 more...)

arXiv.org Artificial Intelligence

2501.05496

Country: Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (0.84)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

Malard, Hugo, Olvera, Michel, Lathuiliere, Stephane, Essid, Slim

arXiv.org Artificial IntelligenceDec-2-2024

Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training additional modules specifically for the task. We adopt a different strategy: we introduce a training-free approach that leverages Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models to reveal shared interpretable concepts. These concepts are passed to an open-vocabulary segmentation model for precise segmentation maps. By using frozen pre-trained models, our method achieves high generalization and establishes state-of-the-art performance in unsupervised sound-prompted segmentation, significantly surpassing previous unsupervised methods.

dataset, representation, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2412.01488

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

$\textbf{S}^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting

Pan, Zijie, Jiang, Yushan, Garg, Sahil, Schneider, Anderson, Nevmyvaka, Yuriy, Song, Dongjin

arXiv.org Artificial IntelligenceJul-7-2024

Recently, there has been a growing interest in leveraging pre-trained large language models (LLMs) for various time series applications. However, the semantic space of LLMs, established through the pre-training, is still underexplored and may help yield more distinctive and informative representations to facilitate time series forecasting. To this end, we propose Semantic Space Informed Prompt learning with LLM ($S^2$IP-LLM) to align the pre-trained semantic space with time series embeddings space and perform time series forecasting based on learned prompts from the joint space. We first design a tokenization module tailored for cross-modality alignment, which explicitly concatenates patches of decomposed time series components to create embeddings that effectively encode the temporal dynamics. Next, we leverage the pre-trained word token embeddings to derive semantic anchors and align selected anchors with time series embeddings by maximizing the cosine similarity in the joint space. This way, $S^2$IP-LLM can retrieve relevant semantic anchors as prompts to provide strong indicators (context) for time series that exhibit different temporal dynamics. With thorough empirical studies on multiple benchmark datasets, we demonstrate that the proposed $S^2$IP-LLM can achieve superior forecasting performance over state-of-the-art baselines. Furthermore, our ablation studies and visualizations verify the necessity of prompt learning informed by semantic space.

forecasting, semantic space informed prompt learning, time sery, (8 more...)

arXiv.org Artificial Intelligence

2403.05798

Country:

Europe > Austria > Vienna (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback